46 research outputs found

    Modeling the interface between morphology and syntax in data-driven dependency parsing

    When people formulate sentences in a language, they follow a set of rules specific to that language that define how words must be put together to express the intended meaning. These rules are called the grammar of the language. Languages have essentially two ways of encoding grammatical information: word order or word form. English primarily uses word order to encode different meanings, but many other languages change the form of the words themselves to express their grammatical function in the sentence. These languages are commonly subsumed under the term morphologically rich languages. Parsing is the automatic process of predicting the grammatical structure of a sentence. Since grammatical structure guides the way we understand sentences, parsing is a key component of computer programs that try to automatically understand what people say and write. This dissertation is about parsing, and specifically about parsing languages with a rich morphology, which encode grammatical information in the form of words. Today's parsing models were developed for English and achieve good results on that language. However, when they are applied to other languages, a significant drop in performance is usually observed. The standard parsing model is a pipeline that divides the parsing process into separate steps; in particular, it separates morphological analysis, i.e. the analysis of word forms, from the actual parsing step. This dissertation argues that this separation is one of the reasons why standard parsers perform worse on languages other than English. An analysis is presented that exposes the connection between the morphological system of a language and the errors of a standard parsing model. In a second series of experiments, we show that knowledge about the syntactic structure of a sentence can support the prediction of morphological information.
We then argue for an alternative approach that models morphological analysis and syntactic analysis jointly instead of separating them. We support this argument with empirical evidence by implementing two parsers that model the relationship between morphology and syntax in two different but complementary ways.
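
The contrast between a pipeline and joint modelling can be illustrated with a toy example. This is a minimal sketch under stated assumptions, not either of the dissertation's parsers: one ambiguous word form may be nominative or accusative and may attach as subject or object, and all scores (including the hypothetical `COMPAT` table linking case to function) are invented for illustration. The pipeline commits to the best case tag first; the joint decoder searches over case and function together and can recover from an early morphological mistake.

```python
from itertools import product
from typing import Tuple

# Hypothetical toy scores for one ambiguous word form.
MORPH_SCORES = {"nom": 0.55, "acc": 0.45}   # case tagger scores
SYNTAX_SCORES = {"subj": 0.4, "obj": 0.6}   # attachment-function scores
# Assumed compatibility bonus: nominative fits subject, accusative fits object.
COMPAT = {
    ("nom", "subj"): 1.0, ("nom", "obj"): 0.0,
    ("acc", "subj"): 0.0, ("acc", "obj"): 1.0,
}

def score(case: str, func: str) -> float:
    """Total score of one (case, function) analysis under the toy additive model."""
    return MORPH_SCORES[case] + SYNTAX_SCORES[func] + COMPAT[(case, func)]

def pipeline() -> Tuple[str, str]:
    """Step 1 fixes the best case tag; step 2 picks the best function given it."""
    case = max(MORPH_SCORES, key=MORPH_SCORES.get)
    func = max(SYNTAX_SCORES, key=lambda f: score(case, f))
    return case, func

def joint() -> Tuple[str, str]:
    """Search all (case, function) pairs at once and return the best-scoring pair."""
    return max(product(MORPH_SCORES, SYNTAX_SCORES), key=lambda cf: score(*cf))

if __name__ == "__main__":
    print(pipeline())  # ('nom', 'subj'), total 1.95
    print(joint())     # ('acc', 'obj'), total 2.05
```

Exhaustive search is only feasible for such a toy case; real joint parsers need approximate or structured search over much larger spaces.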

    Hard constraints for grammatical function labelling

    For languages with (semi-)free word order (such as German), labelling grammatical functions on top of phrase-structural constituent analyses is crucial for making them interpretable. Unfortunately, most statistical classifiers consider only local information for function labelling and fail to capture important restrictions on the distribution of core argument functions such as subject, object, etc., namely that there is at most one subject (etc.) per clause. We augment a statistical classifier with an integer linear program that imposes hard linguistic constraints on the solution space output by the classifier, capturing global distributional restrictions. We show that this improves labelling quality, in particular for argument grammatical functions, in an intrinsic evaluation, and, importantly, grammar coverage for treebank-based (Lexical-Functional) grammar acquisition and parsing, in an extrinsic evaluation.
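
The effect of a hard "at most one subject per clause" constraint can be sketched as follows. This is an illustrative sketch only: the labels (TIGER-style `SB`, `OA`, `DA`, `MO`), the scores, and the function names are hypothetical, and exhaustive enumeration is swapped in for the integer linear program described in the abstract. Both find the same optimum on a tiny clause, but enumeration is exponential and is shown only for clarity.

```python
from itertools import product
from typing import Dict, List

# Hypothetical TIGER-style labels: SB = subject, OA = accusative object,
# DA = dative object, MO = modifier (MO may occur any number of times).
LABELS = ["SB", "OA", "DA", "MO"]
# Assumed hard constraint: each core argument function occurs at most once per clause.
UNIQUE = {"SB", "OA", "DA"}

def local_best(scores: List[Dict[str, float]]) -> List[str]:
    """Pick each constituent's highest-scoring label independently (no global constraints)."""
    return [max(s, key=s.get) for s in scores]

def constrained_best(scores: List[Dict[str, float]]) -> List[str]:
    """Exact search for the highest-scoring labelling that satisfies the constraints.

    Exhaustive enumeration stands in for an ILP solver here; it is exponential
    in the number of constituents and only suitable for tiny clauses.
    Every score dict is assumed to contain a score for every label in LABELS.
    """
    best: List[str] = []
    best_total = float("-inf")
    for labels in product(LABELS, repeat=len(scores)):
        if any(labels.count(lab) > 1 for lab in UNIQUE):
            continue  # violates "at most one subject (etc.) per clause"
        total = sum(s[lab] for s, lab in zip(scores, labels))
        if total > best_total:
            best, best_total = list(labels), total
    return best

if __name__ == "__main__":
    # Two noun phrases in one clause; the local classifier prefers SB for both.
    clause = [
        {"SB": 0.9, "OA": 0.5, "DA": 0.1, "MO": 0.0},
        {"SB": 0.8, "OA": 0.6, "DA": 0.2, "MO": 0.1},
    ]
    print(local_best(clause))        # ['SB', 'SB'] (two subjects)
    print(constrained_best(clause))  # ['SB', 'OA']
```

The local decision produces two subjects; the constrained search trades a small loss on the second phrase for a globally well-formed clause.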

    Overview of the SPMRL 2013 shared task: cross-framework evaluation of parsing morphologically rich languages

    This paper reports on the first shared task on statistical parsing of morphologically rich languages (MRLs). The task features data sets from nine languages, each available in both constituency and dependency annotation. We report on the preparation of the data sets, on the proposed parsing scenarios, and on the evaluation metrics for parsing MRLs given different representation types. We present and analyze parsing results obtained by the task participants, and then provide an analysis and comparison of the parsers across languages and frameworks, reported for gold input as well as for more realistic parsing scenarios.
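
The abstract does not name the metrics, but a standard measure for dependency annotation is the labeled attachment score (LAS): the fraction of tokens whose predicted head and dependency label both match the gold tree, alongside the unlabeled score (UAS), which checks heads only. The sketch below is a generic illustration, not the shared task's evaluation script; it assumes tokens are given as hypothetical `(head, label)` pairs concatenated over a test set, and it does not model any task-specific conventions such as punctuation handling.

```python
from typing import List, Tuple

Arc = Tuple[int, str]  # (head index, dependency label); head 0 = artificial root

def attachment_scores(gold: List[Arc], pred: List[Arc]) -> Tuple[float, float]:
    """Return (UAS, LAS) micro-averaged over all tokens."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted analyses must cover the same tokens")
    if not gold:
        raise ValueError("cannot score an empty token list")
    head_hits = sum(g[0] == p[0] for g, p in zip(gold, pred))  # head correct
    full_hits = sum(g == p for g, p in zip(gold, pred))        # head and label correct
    return head_hits / len(gold), full_hits / len(gold)

if __name__ == "__main__":
    gold = [(2, "nsubj"), (0, "root"), (2, "obj")]
    pred = [(2, "nsubj"), (0, "root"), (2, "iobj")]  # right head, wrong label
    print(attachment_scores(gold, pred))  # UAS 1.0, LAS 2/3
```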

    Community pharmacy internship report

    Internship report completed as part of the Integrated Master's degree in Pharmaceutical Sciences, submitted to the Faculty of Pharmacy of the University of Coimbra.

    Finite-state rule deduction for parsing non-constituent coordination

    Zarrieß S, Seeker W. Finite-state rule deduction for parsing non-constituent coordination. Finite State Methods and Natural Language Processing. 2008:213